mass spectrometer
Disentangling the Complex Multiplexed DIA Spectra in De Novo Peptide Sequencing
Ma, Zheng, Mao, Zeping, Zhang, Ruixue, Chen, Jiazhen, Xin, Lei, Shan, Paul, Ghodsi, Ali, Li, Ming
Data-Independent Acquisition (DIA) was introduced to improve sensitivity by covering all peptides in a range rather than sampling only high-intensity peaks, as in Data-Dependent Acquisition (DDA) mass spectrometry. However, it is not clear how useful DIA data are for de novo peptide sequencing, because DIA data are marred by coeluted peptides, high noise, and varying data quality. We present DIANovo, a new deep learning method that addresses each of these difficulties. By equipping the model with a deeper understanding of coeluted DIA spectra, it improves on the previously established system DeepNovo-DIA by 25% to 81% (averaging 48%) in amino acid recall and by 27% to 89% (averaging 57%) in peptide recall. This paper also provides criteria for when DIA data can and cannot be used for de novo peptide sequencing, based on a comparison between DDA and DIA in both de novo and database search modes. We find that while DIA excels with narrow isolation windows on older-generation instruments, it loses its advantage with wider windows. With the Orbitrap Astral, however, DIA consistently outperforms DDA because the instrument enables a narrow-window mode. We also provide a theoretical explanation of this phenomenon, emphasizing the critical role of the signal-to-noise profile in the successful application of de novo sequencing.
Prompt-Efficient Fine-Tuning for GPT-like Deep Models to Reduce Hallucination and to Improve Reproducibility in Scientific Text Generation Using Stochastic Optimisation Techniques
Large Language Models (LLMs) are increasingly adopted for complex scientific text generation tasks, yet they often suffer from limitations in accuracy, consistency, and hallucination control. This thesis introduces a Parameter-Efficient Fine-Tuning (PEFT) approach tailored for GPT-like models, aiming to mitigate hallucinations and enhance reproducibility, particularly in the computational domain of mass spectrometry. We implemented Low-Rank Adaptation (LoRA) adapters to refine GPT-2, termed MS-GPT, using a specialized corpus of mass spectrometry literature. Through novel evaluation methods applied to LLMs, including BLEU, ROUGE, and Perplexity scores, the fine-tuned MS-GPT model demonstrated superior text coherence and reproducibility compared to the baseline GPT-2, confirmed through statistical analysis with the Wilcoxon rank-sum test. Further, we propose a reproducibility metric based on cosine similarity of model outputs under controlled prompts, showcasing MS-GPT's enhanced stability. This research highlights PEFT's potential to optimize LLMs for scientific contexts, reducing computational costs while improving model reliability.
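The reproducibility metric mentioned above (cosine similarity of model outputs under controlled prompts) can be illustrated with a minimal sketch. The abstract does not say how outputs are vectorized or aggregated, so the bag-of-words token counts and the mean over all output pairs used here are assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty output is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def reproducibility_score(outputs: list[str]) -> float:
    """Mean pairwise cosine similarity of repeated generations for one prompt.

    Assumption: each output is represented by lowercase whitespace-token
    counts; the thesis may use a different representation.
    """
    vectors = [Counter(text.lower().split()) for text in outputs]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two outputs to compare")
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)
```

Under this sketch, a score near 1 means repeated generations for the same prompt use nearly the same vocabulary, and a score near 0 means they share almost none.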
Machine learning meets mass spectrometry: a focused perspective
Boiko, Daniil A., Ananikov, Valentine P.
Mass spectrometry is a widely used method to study molecules and processes in medicine, life sciences, chemistry, catalysis, and industrial product quality control, among many other applications. One of the main features of some mass spectrometry techniques is the extensive level of characterization (especially when coupled with chromatography and ion mobility methods, or as part of a tandem mass spectrometry experiment) and the large amount of data generated per measurement. Terabyte scales can be easily reached with mass spectrometry studies. Consequently, mass spectrometry has faced the challenge of a high level of data disappearance. Researchers often neglect and then altogether lose access to the rich information mass spectrometry experiments could provide. With the development of machine learning methods, the opportunity arises to unlock the potential of these data, enabling previously inaccessible discoveries. The present perspective highlights the reevaluation of mass spectrometry data analysis in the new generation of methods and describes significant challenges in the field, particularly related to problems involving the use of electrospray ionization. We argue that further applications of machine learning raise new requirements for instrumentation (increasing throughput and information density, decreasing pricing, and making software more automation-friendly), and once these are met, the field may experience significant transformation.
Multiple Alignment of Continuous Time Series
Multiple realizations of continuous-valued time series from a stochastic process often contain systematic variations in rate and amplitude. To leverage the information contained in such noisy replicate sets, we need to align them in an appropriate way (for example, to allow the data to be properly combined by adaptive averaging). We present the Continuous Profile Model (CPM), a generative model in which each observed time series is a non-uniformly subsampled version of a single latent trace, to which local rescaling and additive noise are applied. After unsupervised training, the learned trace represents a canonical, high resolution fusion of all the replicates. As well, an alignment in time and scale of each observation to this trace can be found by inference in the model. We apply CPM to successfully align speech signals from multiple speakers and sets of Liquid Chromatography-Mass Spectrometry proteomic data. When observing multiple time series generated by a noisy, stochastic process, large systematic sources of variability are often present.
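The generative story in this abstract (non-uniform subsampling of a latent trace, followed by local rescaling and additive noise) can be sketched as a forward sampler. This is only an illustration under stated assumptions: the jump sizes, the multiplicative scale steps, the scale bounds, and the Gaussian noise are placeholders chosen here, not the paper's parameterization, and the sketch covers neither training nor alignment by inference.

```python
import random


def sample_observation(latent, n_obs, max_jump=3, noise_sd=0.05, seed=None):
    """Draw one observed series from a latent trace (hypothetical helper).

    Assumptions (not from the paper): time advances by a uniform random
    jump of 1..max_jump latent steps, the local scale follows a bounded
    multiplicative random walk, and the noise is Gaussian.
    """
    rng = random.Random(seed)
    pos, scale = 0, 1.0
    times, values = [], []
    for _ in range(n_obs):
        times.append(pos)
        # Emit the rescaled latent value plus additive noise.
        values.append(scale * latent[pos] + rng.gauss(0.0, noise_sd))
        # Non-uniform subsampling: skip ahead by a random number of steps,
        # stopping at the end of the latent trace.
        pos = min(pos + rng.randint(1, max_jump), len(latent) - 1)
        # Local rescaling: nudge the scale up or down, kept within bounds.
        scale = min(2.0, max(0.5, scale * rng.choice((0.9, 1.0, 1.1))))
    return times, values
```

The returned `times` list is the alignment of each observed sample to the latent trace. In the model described above that mapping is hidden and has to be recovered by inference, whereas this sampler simply exposes it.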
History Of AI In 33 Breakthroughs: The First Expert System
In the early 1960s, computer scientist Ed Feigenbaum became interested in "creating models of the thinking processes of scientists, especially the processes of empirical induction by which hypotheses and theories were inferred from data." In April 1964, he met geneticist (and Nobel Prize winner) Joshua Lederberg who told him how experienced chemists use their knowledge about how compounds tend to break up in a mass spectrometer to make guesses about a compound's structure. Recalling in 1987 the development of DENDRAL, the first expert system, Lederberg remarked: "…we were trying to invent AI, and in the process discovered an expert system. This shift of paradigm, 'that Knowledge IS Power' was explicated in our 1971 paper [On Generality and Problem Solving: A Case Study Using the DENDRAL Program], and has been the banner of the knowledge-based-system movement within AI research from that moment." Expert systems represented a new stage in the evolution of AI, shifting from its initial emphasis on general problem-solvers focused on expressing in code human reasoning, i.e., drawing inferences and arriving at logical conclusions.
NASA to Use Machine Learning to Enhance Search for Alien Life on Mars
Researchers at NASA have been hard at work on a pilot AI system intended to help future exploration missions find evidence of life on other planets in our solar system. Machine learning algorithms will help exploration devices analyze soil samples on Mars and return the most relevant data to NASA. The pilot program is currently slated for a test run during the ExoMars mission that will see its launch in mid-2022. As IEEE Spectrum reports, the decision to use machine learning and artificial intelligence to aid the search for life on other planets was driven largely by Eric Lyness, the head of the Goddard Planetary Environments Lab at NASA. Lyness needed to come up with ways of automating aspects of geochemical analyses of samples taken in other parts of our solar system.
News - Research in Germany
Using artificial intelligence, researchers at the Technical University of Munich (TUM) have succeeded in making the mass analysis of proteins from any organism significantly faster than before and almost error-free. This new approach is set to provoke a considerable change in the field of proteomics, as it can be applied in both basic and clinical research. The genome of any organism contains the blueprints for thousands of proteins which control almost all the functions of life. Defective proteins lead to serious diseases, such as cancer, diabetes or dementia. Therefore, proteins are also the most important targets for drugs.
Platform uses artificial intelligence to diagnose Zika and other pathogens
By Karina Toledo Agência FAPESP – A platform that can diagnose several diseases with a high degree of precision using metabolic markers found in patients' blood has been developed by scientists at the University of Campinas (UNICAMP) in Brazil. The method combines mass spectrometry, which can identify tens of thousands of molecules present in blood serum, with an artificial intelligence algorithm capable of finding patterns associated with diseases of viral, bacterial, fungal and even genetic origin. The results have been published in Frontiers in Bioengineering and Biotechnology. "We used infection by Zika virus as a model to develop the platform and showed that in this case, diagnostic accuracy exceeded 95%. One of the main advantages is that the method doesn't lose sensitivity even if the virus mutates," said Melo's supervisor Rodrigo Ramos Catharino, principal investigator for the project.
Inside Berg: the pharma startup fighting cancer with AI (Wired UK)
This article was first published in the April 2016 issue of WIRED magazine. In November 2013, more than 100 patients with cancer - including pancreatic, breast, liver and brain tumours - embarked on clinical trials involving BPM 31510, a drug discovered by an algorithm. The story of BPM 31510 begins with the extraction of biological data from healthy and cancerous tissue samples from over 1,000 patients. This data was then processed by artificial intelligence algorithms, which analysed it and suggested possible drug treatments. "We've essentially reversed the scientific method," says Niven R Narain, the 38-year-old president and co-founder of Berg, the Boston pharma startup which makes BPM 31510. "Instead of a preconceived hypothesis that leads us to do experiments and generate a particular type of data, we allowed the biological data from the patients to lead us to the hypotheses." Making an effective cancer-fighting drug is a notoriously difficult process: according to Narain, development and production can cost pharmaceutical companies up to $2.6 billion (£1.8bn) and take 12 to 14 years to complete. "Only one per cent of the cancer drugs that make it to clinical trials prove to be effective. It's expensive and the development process is inexcusably long," Narain says.